Module 01

Data Science Friday

Installation check

Terminal Installation RStudio Installation GitHub Installation

RMarkdown pretty html challenge

R Markdown PDF Challenge

The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.

http://phdcomics.com/ Comic posted 1-17-2018


Challenge Goals

The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)

hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown

Here’s a header!

Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).

Another header, now with maths

Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:

1231521+12341556280987
## [1] 1.234156e+13

Table Time

Or maybe, after you’ve added those numbers, you feel like it’s about time for a table! I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.

library(knitr)  # load the knitr package, which provides kable()
kable(summary(cars), caption = "I made this table with kable in the knitr package library")
I made this table with kable in the knitr package library

speed           dist
Min.   : 4.0    Min.   :  2.00
1st Qu.:12.0    1st Qu.: 26.00
Median :15.0    Median : 36.00
Mean   :15.4    Mean   : 42.98
3rd Qu.:19.0    3rd Qu.: 56.00
Max.   :25.0    Max.   :120.00
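
As a quick sanity check on the table, individual entries can be recomputed by hand. The sketch below is illustrative only: it assumes the built-in `cars` data set that ships with R, and the variable names `speed_mean` and `dist_mean` are made up for this example.

```r
# Recompute the "Mean" row of the summary table directly from the
# built-in `cars` data set (50 rows: speed and stopping distance).
# The variable names are hypothetical, chosen for this sketch.
speed_mean <- mean(cars$speed)  # should agree with the Mean entry for speed
dist_mean  <- mean(cars$dist)   # should agree with the Mean entry for dist

print(c(speed = speed_mean, dist = dist_mean))
```

The same pattern works for the other rows with `min()`, `median()`, `max()`, or `quantile()`.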

And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!

Origins and Earth Systems

Evidence Worksheet 01

Whitman et al 1998

Learning Objectives

Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.

General questions

  • What were the main questions being asked?
    What is the population size, density, and distribution of prokaryotes on Earth? What quantity of essential nutrients, specifically carbon, is held by prokaryotes?

  • What were the primary methodological approaches used?
    For aquatic and soil samples, the authors used direct counts. For unconsolidated sediments, they cited previous results. For the terrestrial subsurface, they assumed values for porosity and cell volume and used groundwater data.

  • Summarize the main results or findings.
    Most prokaryotes reside in seawater, soil, and the sediment/soil subsurface. The prokaryote population contains large amounts of carbon, nitrogen, phosphorus, and other essential nutrients. The large population size of prokaryotes also suggests that many events occurring in nature are not yet understood.

  • Do new questions arise from the results?
    Prokaryotes are alarmingly abundant, play a role in important nutrient cycles, and experience a high mutation rate. This leads us to wonder what the true consequence of prokaryotic life on earth is.

  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
    The paper contained a significant amount of evidence that was presented in a meaningful way. The assumptions and conclusions were justified using this evidence. However, not all experimental methods were adequately explained. The paper could also use more structure to improve readability.

Evidence Worksheet 02

Kasting et al 2002

Learning Objectives

Comment on the emergence of microbial life and the evolution of Earth systems.

Prompt

  • Indicate the key events in the evolution of Earth systems at tick marks on the time series.

4.6Ga - moon formation, oldest dated zircon minerals found

4.1Ga - late heavy bombardment

3.8Ga - geological evidence for life on earth; photosynthetic fractionation by Rubisco

3.0Ga - production of oxygen by cyanobacteria decreases greenhouse gas effect; life on land; glaciation

2.6Ga - molecular fossil studies show cyanobacteria and eukaryotes present

2.3Ga - evidence of redbed rock formation

1.7Ga - begins with glaciation and the emergence of eukaryotes, ends with another global glaciation

1.0Ga - animals and land plants emerge; followed by gigantism

  • Describe the dominant physical and chemical characteristics of Earth systems at waypoints.

Hadean - Water vapour is high in the atmosphere and a lot is lost to space. Rock vapour is also present in the atmosphere. Seawater chemistry is controlled by volcanism and reactions with debris. The sun is faint, meaning there was either global glaciation or warmth on Earth was sustained by the high greenhouse gas content.

Archaean - Life on earth is now widespread (chemotrophic life). Earth is hot and active with volcanism. Sulphate in hydrothermal systems provides an oxygenation power.

Proterozoic - There is a rapid rise in oxygen as a result of the rise in biological activity. New rock formations and mineral types arise.

Phanerozoic - Life on earth is visible. Land plants carry out photosynthesis. The Earth has assumed its present configuration.

Evidence Worksheet 03

Rockstrom et al 2009

Learning Objectives

Evaluate human impacts on the ecology and biogeochemistry of Earth systems.

General Questions:

  • What were the main questions being asked?
    Which major Earth system processes have thresholds that, if crossed, could shift the system into a new state?

  • What were the primary methodological approaches used?
    This paper cited existing data instead of conducting an actual experiment.

  • Summarize the main results or findings.
    The authors conclude that nine Earth system processes need defined boundaries, and that these boundaries are largely interdependent (climate change, rate of biodiversity loss, interference with the nitrogen and phosphorus cycles, stratospheric ozone depletion, ocean acidification, global freshwater use, change in land use, chemical pollution, and atmospheric aerosol loading). There is evidence that three of these processes are already moving out of the Holocene state, since the current parameters are near or above the proposed boundaries.

  • Do new questions arise from the results?
    The authors themselves acknowledge some of the gaps in the proposition. For example, they used some of their “first best guesses” in defining boundaries. They also acknowledge that the interactions between boundaries are not understood.

  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
    The concept presented was unique and intriguing. There was evidence provided regarding the current states of planetary boundaries. However, the relationship between the state of these boundaries and the proposed shift into the “Anthropocene” was not adequately justified. It wasn’t well explained how the authors came up with the boundaries, nor why reaching the boundaries was notable.

Problem Set 01

Whitman et al 1998

Learning Objectives:

Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.

Specific Questions:

  • What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text.
    The primary prokaryotic habitats on Earth are aquatic, soil, and oceanic and terrestrial subsurface habitats. These contain $1.181 \times 10^{20}$, $2.556 \times 10^{29}$, and $3.8 \times 10^{30}$ cells respectively.

  • What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacterium including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?
    The upper 200 m of the ocean contains a total of $3.6 \times 10^{28}$ cells. Of these, $2.9 \times 10^{27}$ cells (about 8%) are marine cyanobacteria. This fraction is significant because cyanobacteria are autotrophic, meaning they assimilate carbon and produce oxygen.

  • What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?
    An autotroph fixes inorganic carbon. A heterotroph assimilates organic carbon. A lithotroph uses inorganic substrates.

  • Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?
    The deepest habitat capable of supporting prokaryotic life is at 3000m-4000m deep. The primary limiting factor at this depth is temperature.

  • Based on information provided in the text and your knowledge of geography what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?
    The highest habitat capable of supporting prokaryotic life is at 55-77km high (perhaps an overly optimistic estimate). The primary limiting factor is likely access to other organisms and nutrients, or ionizing radiation.

  • Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?
    The estimated vertical distance of Earth’s biosphere is 58-81 km (3-4 km below the surface plus 55-77 km above it).

  • How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)
    The authors used estimates of prokaryotic carbon in each environment to set limits on turnover rates. Multiplying the population size by the number of turnovers per year yields the annual cellular production.

For example:

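$$(3.6 \times 10^{28}\ \text{cells}) \times \frac{1\ \text{turnover}}{16\ \text{days}} \times \frac{365\ \text{days}}{\text{year}} = 8.2 \times 10^{29}\ \text{cells/year}$$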
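
The same arithmetic can be sketched in R. This is a minimal illustration, not the paper's own procedure: the variable names are hypothetical, and the inputs (a population of 3.6e28 cells and a 16-day turnover time) are simply the values used in the example above.

```r
# Annual cellular production = population size x turnovers per year.
# Variable names are illustrative; the inputs come from the worked example.
population_cells <- 3.6e28  # prokaryotic cells in the upper 200 m of the ocean
turnover_days    <- 16      # turnover time in days

turnovers_per_year <- 365 / turnover_days               # about 22.8 per year
annual_production  <- population_cells * turnovers_per_year

# Report the result in cells per year, rounded to two significant figures
print(signif(annual_production, 2))
```

Changing `population_cells` and `turnover_days` to the values for another habitat gives the corresponding annual production in the same way.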

  • What is the relationship between carbon content, carbon assimilation efficiency and turnover rates in the upper 200m of the ocean? Why does this vary with depth in the ocean and between terrestrial and marine habitats?
    Carbon assimilation will increase carbon content, and carbon turnover will decrease carbon content. Carbon content likely increases with depth in the ocean because detritus sinks. Assimilation efficiency and turnover rates probably decrease with depth because productivity decreases: sunlight is lacking, temperature decreases, and pressure increases. The amount of carbon in prokaryotes is much higher in terrestrial than in marine habitats. However, the net primary productivity is higher in marine habitats.

  • How were the frequency numbers for four simultaneous mutations in shared genes determined for marine heterotrophs and marine autotrophs given an average mutation rate of $4 \times 10^{-7}$ per DNA replication? (Provide an example of the calculation with units. Hint: cell and generation cancel out)

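Example calculation for heterotrophs in the upper 200 m:

$$(3.6 \times 10^{28}\ \text{cells}) \times (22.8\ \text{turnovers/year}) = 8.2 \times 10^{29}\ \text{cells/year}$$

$$(4 \times 10^{-7}\ \text{mutations/generation})^{4} = 2.56 \times 10^{-26}\ \text{mutations/generation}$$

$$(8.2 \times 10^{29}\ \text{cells/year}) \times (2.56 \times 10^{-26}\ \text{mutations/generation}) = 2.1 \times 10^{4}\ \text{mutations/year}$$

$$\frac{8.7 \times 10^{3}\ \text{hours/year}}{2.1 \times 10^{4}\ \text{mutations/year}} = 0.4\ \text{hours/mutation}$$

That is, a cell carrying all four mutations arises roughly once every 0.4 hours.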
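
The chain of steps in this example can also be sketched in R. This is a hedged illustration under the same inputs as the example; the variable names are hypothetical, and the final value is read as the waiting time in hours between events in which all four mutations occur together.

```r
# Frequency of four simultaneous mutations for marine heterotrophs in the
# upper 200 m. Names are illustrative; inputs come from the worked example.
cells_per_year <- 3.6e28 * 22.8  # population size x turnovers per year
mutation_rate  <- 4e-7           # mutations per gene per DNA replication
n_simultaneous <- 4              # number of mutations required together

# Probability that a single replication carries all four mutations
p_four <- mutation_rate^n_simultaneous

# Expected number of such events per year across the whole population
events_per_year <- cells_per_year * p_four

# Convert to a waiting time: hours in a year divided by events per year
hours_per_year  <- 365 * 24
hours_per_event <- hours_per_year / events_per_year

print(signif(hours_per_event, 1))
```

Replacing the population size and turnover rate with the values for marine autotrophs gives the corresponding estimate for that group.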

  • Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?
    The large population size and high mutation rate result in high genetic diversity and adaptive potential for prokaryotes. Point mutations are not the only route: microbial genomes also diversify through plasmid transfer, recombination, etc.

  • What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?
    High abundance and rapid replication result in a high mutation rate. A high mutation rate increases the diversity of prokaryotes which, in turn, increases their metabolic potential.

Problem Set 02

Falkowski et al 2008

Learning Objectives:

Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.

Specific Questions:

  • What are the primary geophysical and biogeochemical processes that create and sustain conditions for life on Earth? How do abiotic versus biotic processes vary with respect to matter and energy transformation and how are they interconnected?
    Primary geophysical processes are based on tectonics and atmospheric photochemical processes. The high level of hydrogen in Earth’s atmosphere results in the majority of these abiotic processes being based on acid base chemistry. Acid base reactions supply and remove substances to create cycling of nutrients. The primary biogeochemical processes are abiotic acid base reactions and biotic redox reactions both based on cycling of H, C, N, O, S, and P. All the processes connect to create balance in elements and processes on earth.

  • Why is Earth’s redox state considered an emergent property?
    Earth’s redox state is considered an emergent property since it has emerged as a result of interactions between microbial metabolic redox reactions and abiotic geochemical processes.

  • How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?
    Reversible electron transfer reactions of elements and nutrients are fundamentally globally interconnected. The biosphere attains a self-sustaining state in different ecological scales as a result of coupled nutrient cycling. In order to overcome thermodynamic barriers to reversible electron flow, microbial metabolic pathways have evolved to use products of one cycle as substrates in another.

  • Using information provided in the text, describe how the nitrogen cycle partitions between different redox niches and microbial groups. Is there a relationship between the nitrogen cycle and climate change?
    The nitrogen cycle partitions between different redox niches and microbial groups based on the availability of oxygen and organic matter. In the presence of oxygen, a specific group of Bacteria or Archaea performs the first oxidation. The second oxidation is performed by a different group of nitrifying bacteria. In the absence of oxygen, a third set of microbes uses nitrite and nitrate as electron acceptors to close the cycle. Nitrous oxide is a potent greenhouse gas involved in the nitrogen cycle, which may affect climate change.

  • What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?
    The more microbial genomes we sequence, the more unique protein families we discover. It is argued that the vast amount of genomic information discovered recently suggests a “limitless evolutionary diversity in nature”. Most of the genetic diversity observed is contained within nonessential genes and environment-specific genes. Essential genes involved in metabolism do not display the same level of diversity.

  • On what basis do the authors consider microbes the guardians of metabolism?
    Microbes harbour core metabolic machinery that stands the test of time. Because of their inherent resilience, they “ferry” critical gene sets through extinction events.

Module 01 References

Falkowski PG, Fenchel T, Delong EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science. 320(5879):1034-1039. Science1153213

Kasting JF, and Siefert JL. 2002. Life and the Evolution of Earth’s Atmosphere. Science. 296(5570):1066-1068. Science1071184

Rockstrom et al. 2009. A safe operating space for humanity. Nature. 461:472-475. Nature461472

Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578–6583. PMC33863